Careers


Hadoop Developer – San Jose, CA

Job Description

  • Study the existing complex architecture of the client's different channels and their multiple systems, make the necessary recommendations, and change the current architecture.
  • Work on the entire development life cycle of projects involving Big Data applications and, together with the team, make decisions such as choosing the right Big Data development tools, best practices for setting up the cluster-level Big Data ecosystem, and the best software build/deployment practices given time constraints.
  • Make sure all TOPO settings have been enabled correctly in the production configuration.
  • Identify all touchpoints in the project, determine the customer's intent and the best next action at each touchpoint, and connect the touchpoints to build the project.
  • Work shifts (US hours) to provide Hadoop platform support and perform administrative tasks on production Hadoop clusters.
  • Evaluate Elasticsearch cluster performance and carry out extensive performance tuning of the ES queries used in the client's project.
  • Find cluster-level solutions for our complex systems and develop enterprise-level applications, followed by unit testing.
  • Perform data transformation using complex Hive Query Language (HQL) and build a data lake used for data analysis. Run complex queries and work with bucketing, partitioning, joins, and sub-queries (a sketch of this kind of transformation follows this list).
  • Modify existing configuration files in accordance with development requirements and set up new Apache Hadoop components in the ecosystem.
  • Perform complex data cleanup, data extraction, and validation using MapReduce and Hive.
  • Write applications in multiple languages such as Scala, Java, and Python.
  • Write advanced Big Data business application code in both functional and object-oriented programming styles.
  • Develop standalone Spark/Scala applications that read error logs from multiple upstream data sources and run validations on them.
  • Write build scripts using tools such as Apache Maven, Ant, and sbt, and deploy the code using Jenkins for CI/CD.
  • Write complex workflow jobs using Oozie and set up a multi-program scheduling system that helps manage multiple Hadoop, Hive, Sqoop, and Spark jobs.
  • Closely monitor pipeline jobs and troubleshoot failed jobs. Set up several new property configurations within Oozie.
  • Develop Kafka producers that capture several streams of data within a specified duration.
  • Work on Spark Streaming applications that listen to multiple data streams coming from several HTTP web streams.
  • Perform data cleanup and validation on streaming data using Spark, Spark Streaming, and Scala (a streaming example follows this list).
  • Develop data ingestion scripts in Scala and Python.
  • Build dashboards with Elastic Stack components such as Kibana and Logstash.
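
To give a flavor of the HQL-style transformation and data lake work above, the following is a minimal Scala/Spark SQL sketch. All table and column names (events_raw, events_clean, customers) are hypothetical and the real queries depend on the client's data model; bucketing would additionally be declared in the production DDL with CLUSTERED BY ... INTO N BUCKETS.

    import org.apache.spark.sql.SparkSession

    object HqlTransformSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hql-transform-sketch")
          .enableHiveSupport()          // run HQL against the Hive metastore
          .getOrCreate()

        // Target data-lake table, partitioned by event date.
        spark.sql(
          """CREATE TABLE IF NOT EXISTS events_clean (
            |  event_id STRING, customer_id STRING, amount DOUBLE)
            |PARTITIONED BY (event_date STRING)
            |STORED AS PARQUET""".stripMargin)

        // Allow dynamic partition inserts, then load cleaned rows from the raw table.
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql(
          """INSERT OVERWRITE TABLE events_clean PARTITION (event_date)
            |SELECT event_id, customer_id, CAST(amount AS DOUBLE), event_date
            |FROM events_raw
            |WHERE event_id IS NOT NULL""".stripMargin)

        // Example analysis query combining a join and a sub-query.
        spark.sql(
          """SELECT c.region, SUM(e.amount) AS total_amount
            |FROM events_clean e
            |JOIN customers c ON e.customer_id = c.customer_id
            |WHERE e.event_date = (SELECT MAX(event_date) FROM events_clean)
            |GROUP BY c.region""".stripMargin).show()

        spark.stop()
      }
    }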
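
The streaming cleanup and validation duties could be sketched as below with Spark Structured Streaming reading from Kafka. The broker address, topic name, JSON schema, and output paths are illustrative assumptions, and the spark-sql-kafka connector would need to be on the classpath.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object StreamingValidationSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("streaming-validation-sketch")
          .getOrCreate()
        import spark.implicits._

        // Hypothetical schema of the events published by the upstream Kafka producers.
        val schema = new StructType()
          .add("event_id", StringType)
          .add("customer_id", StringType)
          .add("amount", DoubleType)

        // Read the raw event stream from a hypothetical "web-events" topic.
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "web-events")
          .load()

        // Parse the JSON payload and keep only records passing basic validation rules.
        val clean = raw
          .select(from_json($"value".cast("string"), schema).as("e"))
          .select("e.*")
          .filter($"event_id".isNotNull && $"amount" >= 0)

        // Land the cleaned stream in the data lake for downstream analysis.
        clean.writeStream
          .format("parquet")
          .option("path", "/data/lake/events_clean")
          .option("checkpointLocation", "/data/checkpoints/events_clean")
          .start()
          .awaitTermination()
      }
    }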

Required Skills:

  • A minimum of a bachelor's degree in Computer Science or equivalent.
  • Cloudera Hadoop (CDH), Cloudera Manager, Informatica Big Data Edition (BDM), HDFS, YARN, MapReduce, Hive, Impala, Kudu, Sqoop, Spark, Kafka, HBase, Teradata Studio Express, Teradata, Tableau, Kerberos, Active Directory, Sentry, TLS/SSL, Linux/RHEL, Unix, Windows, SBT, Maven, Jenkins, Oracle, MS SQL Server, Shell Scripting, Eclipse IDE, Git, SVN
  • Must have strong problem-solving and analytical skills.
  • Must have the ability to identify complex problems and review related information to develop and evaluate options and implement solutions.

If you are interested in working in a fast-paced, challenging, fun, entrepreneurial environment and would like the opportunity to be a part of this fascinating industry, send your resume to HSTechnologies LLC, 2801 W Parker Road, Suite #5, Plano, TX 75023, or email it to hr@sbhstech.com.